APPENDIX III - INSTRUCTION SPEED AND FLAGS
This appendix contains a list of all the 8086 instructions along
with their relative speeds and the flags they affect. These speed
numbers are from INTEL and are in clock ticks.{1} If you have a
25 MHz clock, then it is doing 25 million ticks per second. If
you have a 4.77 MHz clock, then it is doing 4.77 million ticks
per second. Instead of calling them ticks, I'll be calling them
clocks.
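Converting clocks into time is a single division. Below is a minimal Python sketch; the function name is our own, not Intel's. Like the figures in this appendix, it assumes the processor never has to wait on memory.

```python
def clocks_to_microseconds(clocks: int, mhz: float) -> float:
    """Time, in microseconds, taken by `clocks` ticks at a clock rate of `mhz` MHz."""
    # A 1 MHz clock ticks a million times per second, so one tick lasts 1/mhz microseconds.
    return clocks / mhz


# Compare the same 29-clock instruction on a slow and a fast machine.
for mhz in (4.77, 25.0):
    print(f"29 clocks at {mhz} MHz = {clocks_to_microseconds(29, mhz):.2f} microseconds")
```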
To understand these numbers, you need a little extra
information. To execute an instruction, ANY microprocessor
must:
(1) calculate any memory addresses,
(2) fetch the two operands (or the single operand if it is
an instruction like NOT),
(3) process the instruction, and
(4) put the result in the destination.
While (3) requires the same amount of time whether the
operands are in memory or in registers, (1), (2) and (4) all
differ depending on where the operands are. Some machines
allow both operands to be in memory, but the 8086 does not
(except in string instructions such as MOVS). Therefore, (1)
involves either NO memory address (if everything is in
registers) or ONE memory address.
Calculating a memory address involves (a) calculating the offset
from the beginning of the segment and (b) adding it to the
segment starting address. Part (a) is different depending on the
addressing mode. In terms of the speed of calculating (a) and
(b), the order is:
(i) one pointer
(ii) named variable (+ constant)
(iii) two pointers
(iv) one pointer + constant
(v) two pointers + constant
The reason that (ii) contains both "variable" and "variable +
constant" is that the addition (variable + constant) is done by
the assembler, not the 8086. By the time the 8086 sees it, it is
a single number. The constants in both (iv) and (v) need to be
added by the 8086, and this takes extra time. According to INTEL,
here is the time required for calculating all possible memory
addresses. Note that some pointers are marginally faster than
other pointers (this is trivial - don't worry about it).
____________________
1 All the speed numbers are from "Programmer's Pocket
Reference Guide", (c)1980, 1982 Intel Corporation.
______________________
The PC Assembler Tutor - Copyright (C) 1990 Chuck Nelson
ADDRESSING MODE                             CLOCKS (EA)
(i)   [bx], [si], [di], [bp]                        5
(ii)  variable (+ constant)                         6
(iii) [bp+di] or [bx+si]                            7
      [bp+si] or [bx+di]                            8
(iv)  ([bx] or [si] or [di] or [bp]) + constant     9
(v)   ([bp+di] or [bx+si]) + constant              11
      ([bp+si] or [bx+di]) + constant              12
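The table amounts to a lookup from addressing mode to clocks. Here is a Python sketch of that lookup. The key spellings are our own convention, including `disp` for a constant the 8086 must add at run time.

```python
# Clocks needed to calculate the effective address (EA), from the table above.
# "disp" marks a constant that the 8086 itself must add at run time.
EA_CLOCKS = {
    "[bx]": 5, "[si]": 5, "[di]": 5, "[bp]": 5,
    "variable": 6,  # the assembler has already folded any "+ constant" into one number
    "[bp+di]": 7, "[bx+si]": 7,
    "[bp+si]": 8, "[bx+di]": 8,
    "[bx+disp]": 9, "[si+disp]": 9, "[di+disp]": 9, "[bp+disp]": 9,
    "[bp+di+disp]": 11, "[bx+si+disp]": 11,
    "[bp+si+disp]": 12, "[bx+di+disp]": 12,
}


def ea_clocks(mode: str) -> int:
    """Look up the EA time for a mode written like "[BX + DI + disp]"."""
    return EA_CLOCKS[mode.lower().replace(" ", "")]


# Ratio of the slowest to the fastest address calculation.
print(max(EA_CLOCKS.values()) / min(EA_CLOCKS.values()))  # 2.4
```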
The most complicated memory address takes 2.4 times as long to
calculate as the simplest one (12 clocks versus 5). These
calculation times will be denoted by EA (calculate Effective
Address). Remember, if both operands are in registers, this
calculation does not have to be done.
In order for you to see how all of this works, we'll use ADD as
an example. Don't start using the table till you understand this
example.
On the 8086, we normally have the following possibilities for
destination and source:
register, register
register, memory
memory, register
register, constant
memory, constant
In Appendix II, we simply combined them as:
reg/mem, reg/mem
reg/mem, constant
We can't do this here because they all have different times. For
ADD they are:
ADD                   CLOCKS
register, register    3
register, memory      9 + EA
memory, register      16 + EA
register, constant    4
memory, constant      17 + EA
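Each ADD time in this table is a base figure, plus EA whenever one operand is in memory. Below is a Python sketch of that rule. The names are illustrative, and the forms are written destination first, as in the table.

```python
# ADD timings by (destination, source) form: (base clocks, needs an EA?).
ADD_TIMES = {
    ("register", "register"): (3, False),
    ("register", "memory"): (9, True),
    ("memory", "register"): (16, True),
    ("register", "constant"): (4, False),
    ("memory", "constant"): (17, True),
}


def add_clocks(destination: str, source: str, ea: int = 0) -> int:
    """Clocks for ADD: the base time, plus EA when one operand is in memory."""
    base, uses_memory = ADD_TIMES[(destination, source)]
    if not uses_memory and ea:
        raise ValueError("a register-only form has no address to calculate")
    return base + ea if uses_memory else base


print(add_clocks("register", "register"))       # add ax, bx         -> 3
print(add_clocks("memory", "constant", ea=12))  # add [bx+di+9], 17  -> 29
```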
Notice how much faster using a register is. The EA stands for
"calculate Effective Address" and is the number from the list
above. If you have:
add ax, bx
that is "register, register", and it will take 3 clocks to
execute. If you have:
add [bx+di+9], 17
that is "memory, constant" and will take 17 + EA. What is EA
here? According to the above list, [BX+DI+CONSTANT] takes 12
clocks, so 17 + EA is 17 + 12 = 29, and this instruction will
take 29 clocks.
That's right. The one instruction is almost 10 times slower than
the other. If you can move things into some registers, do a
number of calculations, and then move them back to memory, you
can save a lot of time. Let's do a few examples to make sure that
you see all of them:
INSTRUCTION            TYPE                  TIME
add variable1, bl      memory, register      16 + EA = 22
add bl, variable1      register, memory      9 + EA = 15
add [si], di           memory, register      16 + EA = 21
add di, [si]           register, memory      9 + EA = 14
Examples 1 and 2 are the same except that source and destination
have been switched. The same applies to examples 3 and 4. Notice
that when the 8086 has to fetch the variable and then put the
result back in memory, it is significantly slower than when it
just gets the variable from memory and puts the result in a
register.
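The earlier advice (move things into registers, do the calculations, then move them back) can be checked against the table's numbers. This Python sketch prices adding several registers into a word variable. It assumes the hypothetical variable1 sits at an even address. It also uses MOV's general register/memory times from the list later in this appendix: 8 + EA to load and 9 + EA to store. MOV's faster AX-only form would tilt things further toward registers. With these numbers the register version loses for a single ADD but wins from two onward.

```python
EA_DIRECT = 6  # EA time for a named variable such as variable1


def in_memory(n_adds: int) -> int:
    """n x  add variable1, reg   (memory, register: 16 + EA each)."""
    return n_adds * (16 + EA_DIRECT)


def via_register(n_adds: int) -> int:
    """mov ax, variable1 / n x add ax, reg / mov variable1, ax."""
    load = 8 + EA_DIRECT     # MOV register, memory
    adds = n_adds * 3        # ADD register, register
    store = 9 + EA_DIRECT    # MOV memory, register
    return load + adds + store


for n in range(1, 6):
    print(f"{n} adds: {in_memory(n):3} clocks in memory, {via_register(n):3} via AX")
```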
INSTRUCTION            TYPE                  TIME
add ax, cx             register, register    3
add di, 1876           register, constant    4
add variable1, 199     memory, constant      17 + EA = 23
To show you how the different types of EA affect the time, let's
do all 3 types of "destination, source" that involve memory.
First, "memory, register":
INSTRUCTION            TIME
add [bx], ax           16 + EA = 21
add variable1, ax      16 + EA = 22
add [bp+di], ax        16 + EA = 23
add [bp+si], ax        16 + EA = 24
add [bx+9], ax         16 + EA = 25
add [bp+di+294], ax    16 + EA = 27
add [bx+di+294], ax    16 + EA = 28
Now let's do the same things but go "register, memory":
INSTRUCTION            TIME
add ax, [bx]           9 + EA = 14
add ax, variable1      9 + EA = 15
add ax, [bp+di]        9 + EA = 16
add ax, [bp+si]        9 + EA = 17
add ax, [bx+9]         9 + EA = 18
add ax, [bp+di+294]    9 + EA = 20
add ax, [bx+di+294]    9 + EA = 21
And finally we have "memory, constant":
INSTRUCTION            TIME
add [bx], 177          17 + EA = 22
add variable1, 177     17 + EA = 23
add [bp+di], 177       17 + EA = 24
add [bp+si], 177       17 + EA = 25
add [bx+9], 177        17 + EA = 26
add [bp+di+294], 177   17 + EA = 28
add [bx+di+294], 177   17 + EA = 29
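All three tables are the same EA list shifted by a fixed base, and a short Python sketch can regenerate them. The sketch also makes one consequence visible: "memory, register" always costs exactly 7 clocks more than "register, memory" (16 - 9), whatever the addressing mode.

```python
# Addressing modes used in the three tables, with their EA times.
MODES = [
    ("[bx]", 5), ("variable1", 6), ("[bp+di]", 7), ("[bp+si]", 8),
    ("[bx+9]", 9), ("[bp+di+294]", 11), ("[bx+di+294]", 12),
]
# Base ADD time and instruction template for each form.
FORMS = {
    "memory, register": (16, "add {m}, ax"),
    "register, memory": (9, "add ax, {m}"),
    "memory, constant": (17, "add {m}, 177"),
}


def table(form: str) -> dict:
    """Map each example instruction of `form` to its total clocks (base + EA)."""
    base, template = FORMS[form]
    return {template.format(m=mode): base + ea for mode, ea in MODES}


for form in FORMS:
    print(form)
    for text, clocks in table(form).items():
        print(f"    {text:<22}{clocks}")
```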
Is this everything you need to know before looking at the list?
Not quite. The 8086 has a 16 bit data bus. (The 8088 in the
original 4.77 MHz IBM PC has only an 8 bit one.) That means that
there are 16 wires connecting the processor to memory, and the
processor reads 1 word (2 bytes) at a time. These memory reads
ALWAYS start at an even address, such as 1472d, 88026d or
198752d. If you are reading one byte, it makes no difference
whether it is at an even or odd location. If you are reading a
word at an even location, then everything is normal. If you are
reading a word at an ODD location, however, the processor must:
1) read the word at the even address just below the
variable (which contains its first byte),
2) read the next even word (which contains the last byte
of the variable), and
3) join the parts together.
As an example, let's take a word at address 21957 (i.e. bytes
21957-21958). The processor will:
1) read the word at 21956 and keep its high byte (21957),
2) read the word at 21958 and keep its low byte (21958), and
3) join them together. It now has 21957-21958.
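The processor can do this, but it takes extra time: 4 extra
clocks for each word it transfers at an odd address. (An
instruction that reads a word and then writes it back, such as
ADD memory, register, pays this twice.) Our speed listing will
therefore also contain the following notice:
WORDS WHICH ARE AT ODD ADDRESSES NEED 4 EXTRA CLOCKS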
Thus for:
add ax, variable1 (9 + EA = 15)
if the address is an even location, the instruction will require
15 clocks. If it is an odd location, the instruction will require
19 clocks (4 extra clocks). It is worth your while to keep words
at even locations if at all possible.
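The bus behaviour described above can be modelled simply in Python. In this hypothetical setup a plain dict stands in for RAM, and each bus cycle fetches the byte pair at one even address. Every cycle beyond the first is charged the 4-clock penalty. The model only counts bus cycles; it does not imitate the 8086's real bus signals.

```python
def bus_reads(address: int, size: int) -> list:
    """Even addresses a 16-bit bus must read to fetch `size` bytes at `address`."""
    first = address & ~1               # round down to an even (word) boundary
    last = (address + size - 1) & ~1   # even word holding the final byte
    return list(range(first, last + 1, 2))


def read_word(memory: dict, address: int) -> tuple:
    """Fetch a little-endian word the way the text describes.

    `memory` maps byte addresses to byte values. Returns (value, bus_cycles).
    """
    reads = bus_reads(address, 2)
    fetched = {}
    for even in reads:                 # one bus cycle delivers a pair of bytes
        fetched[even] = memory.get(even, 0)
        fetched[even + 1] = memory.get(even + 1, 0)
    low, high = fetched[address], fetched[address + 1]  # join the parts
    return low | (high << 8), len(reads)


memory = {21956: 0xAA, 21957: 0x34, 21958: 0x12, 21959: 0xBB}
value, cycles = read_word(memory, 21957)
print(hex(value), cycles, "extra clocks:", 4 * (cycles - 1))  # 0x1234 2 extra clocks: 4
```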
INSTRUCTION SPEEDS AND FLAGS
REGISTER:
One of the arithmetic registers - either word (AX, BX, CX,
DX, SI, DI, BP, SP for word operations) or byte (AH, AL, BH,
BL, CH, CL, DH or DL for byte operations).
AX or AL:
AX and AL are considered special on the 8086. Sometimes
there is a special AX/AL form of the instruction (such as
for ADD). This form is shorter (it takes fewer bytes), but it
will only be noted if it is also FASTER (such as for MOV) or if
it is the only form allowed. It will say either (AX/AL) or
(AX only), depending on whether both words and bytes are allowed
or only words are. Though both multiplication and division
require the AX/AL register, the AX/AL is understood and is not
mentioned in the instruction.
SEGREG:
One of the 4 segment registers - CS, DS, ES or SS.
MEMORY:
Either a byte or word in memory. It may be addressed with
any possible addressing mode, and the extra time needed to
calculate the address in memory (EA - calculate Effective
Address) is the following:
ADDRESSING MODE                             CLOCKS (EA)
(i)   [bx], [si], [di], [bp]                        5
(ii)  variable (+ constant)                         6
(iii) [bp+di] or [bx+si]                            7
      [bp+si] or [bx+di]                            8
(iv)  ([bx] or [si] or [di] or [bp]) + constant     9
(v)   ([bp+di] or [bx+si]) + constant              11
      ([bp+si] or [bx+di]) + constant              12
EXTRA TIME:
1) WORDS WHICH ARE AT ODD ADDRESSES NEED 4 EXTRA CLOCKS (for each
word transfer, so twice for an instruction that reads a word and
writes it back).
2) A segment override adds 2 clocks to the instruction.
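Taken together, an instruction's time is the listed figure plus EA plus these extras. Here is a Python sketch of that sum. It follows Intel's convention of charging 4 clocks for each word transferred at an odd address, so a read-modify-write such as ADD memory, register is charged twice. The function name and parameters are our own.

```python
def instruction_clocks(base: int, ea: int = 0, odd_word_transfers: int = 0,
                       segment_override: bool = False) -> int:
    """Total clocks: listed time + EA + 4 per odd-address word transfer + 2 per override."""
    return base + ea + 4 * odd_word_transfers + (2 if segment_override else 0)


# add ax, variable1 with variable1 at an odd address: one word read.
print(instruction_clocks(9, ea=6, odd_word_transfers=1))  # 19
# add es:[bx+di+9], ax on an odd word: read, write back, plus an ES: override.
print(instruction_clocks(16, ea=12, odd_word_transfers=2, segment_override=True))  # 38
```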
FLAGS
Following the instruction mnemonic is information in square
brackets that indicates how the instruction affects the flags
register. There are three possibilities:
1) The instruction may alter the value of a flag in a
particular way depending on the result. For AND, the sign,
zero and parity flags [SZP] will be set according to the
result.
2) The instruction may set a flag to a specific number
(either 1 or 0). For AND, the overflow flag and the carry
flag are cleared [(OC=0)].
3) The instruction may unreliably alter a flag. That means
that the instruction might change the flag, but the new value
gives you no information. This kind of flag cannot be
trusted after this operation. For AND, the auxiliary flag is
unreliable [?A?].
The information will always be displayed in this order. The flags
which are reliably set according to the result will be listed
first [SZP]. Next, any flags which are either set or cleared will
be put inside parentheses, followed by an equal sign and the value
of the flag (all inside the parentheses) [SZP,(OC=0)]. Finally,
any unreliable flags will be put between question marks
[SZP,(OC=0),?A?]. Each part will be separated by commas. If no
flags are changed by the instruction, the brackets will have
[none] written between them.
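Because the notation is regular, it can be read mechanically. The Python sketch below parses it; the function and the dictionary keys are our own naming. It handles the three kinds of entry described above. It rejects entries that only point to a footnote, such as [ {12} ].

```python
import re


def parse_flags(notation: str) -> dict:
    """Split a summary such as "[SZP,(OC=0),?A?]" into its three kinds of flags."""
    result = {"by_result": set(), "forced": {}, "unreliable": set()}
    body = notation.strip().strip("[]").strip()
    if body == "none":
        return result
    for part in (p.strip() for p in body.split(",")):
        forced = re.fullmatch(r"\(([A-Z]+)=([01])\)", part)
        if forced:                                          # e.g. (OC=0): fixed value
            for flag in forced.group(1):
                result["forced"][flag] = int(forced.group(2))
        elif len(part) > 2 and part[0] == part[-1] == "?":  # e.g. ?A?: unreliable
            result["unreliable"].update(part[1:-1])
        elif part.isalpha() and part.isupper():             # e.g. SZP: set by the result
            result["by_result"].update(part)
        else:                                               # e.g. a footnote marker
            raise ValueError(f"unrecognised flag part: {part!r}")
    return result


print(parse_flags("[SZP,(OC=0),?A?]"))
```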
Each flag is indicated by a single letter. The letters are:
O   overflow flag       0 = no overflow, 1 = overflow
D   direction flag      direction of movement for string instructions
                        0 = upwards, 1 = downwards
I   interrupt enable    0 = no ints, 1 = ints o.k.
T   trap flag           trap next instruction?
                        0 = no trap, 1 = trap
S   sign flag           0 = positive, 1 = negative
Z   zero flag           0 = non-zero, 1 = zero
A   auxiliary flag      carry out of the low 4 bits?
                        0 = no, 1 = yes
P   parity flag         0 = odd, 1 = even
C   carry flag          0 = no carry, 1 = carry
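To illustrate the letters and AND's entry [SZP,(OC=0),?A?], here is a Python sketch that computes an 8-bit AND and the flags it defines. It models only AND on byte operands, and the function name is ours.

```python
def and8(a: int, b: int) -> tuple:
    """8-bit AND and the flags it defines, following AND's [SZP,(OC=0),?A?]."""
    result = a & b & 0xFF
    flags = {
        "S": result >> 7,                           # copy of the top (sign) bit
        "Z": int(result == 0),
        "P": int(bin(result).count("1") % 2 == 0),  # 1 = even number of 1 bits
        "O": 0,                                     # always cleared by AND
        "C": 0,                                     # always cleared by AND
    }
    # A is left out on purpose: after AND its value cannot be trusted.
    return result, flags


print(and8(0x0F, 0xF0))  # zero result: Z = 1, and zero 1 bits counts as even parity
print(and8(0xFF, 0x83))
```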
***************** THE INSTRUCTIONS *********************
INSTRUCTION                         TIMING
AAA              [AC,?OSZP?]        4 clocks
AAD              [SZP,?OAC?]        60 clocks
AAM              [SZP,?OAC?]        83 clocks
AAS              [AC,?OSZP?]        4 clocks
ADC              [OSZAPC]           see ADD
ADD              [OSZAPC]           register, register    3
                                    register, memory      9 + EA
                                    memory, register      16 + EA
                                    register, constant    4
                                    memory, constant      17 + EA
AND              [SZP,(OC=0),?A?]   see ADD
CALL             [none]             near call             19
                                    far call              28
                                    near call (reg-ind)   16 {2}
                                    near call (mem-ind)   21 + EA
                                    far call (mem-ind)    37 + EA
CBW              [none]             2 clocks
CLC              [(C=0)]            2 clocks
CLD              [(D=0)]            2 clocks
CLI              [(I=0)]            2 clocks
CMC              [C]                2 clocks
CMP              [OSZAPC]           register, register    3
                                    register, memory      9 + EA
                                    memory, register      9 + EA
                                    register, constant    4
                                    memory, constant      10 + EA
CMPS             [OSZAPC]           22 clocks
CWD              [none]             5 clocks
DAA              [SZAPC,?O?]        4 clocks
DAS              [SZAPC,?O?]        4 clocks
DEC              [OSZAP]            word register         2
                                    byte register         3
                                    word/byte memory      15 + EA
DIV              [?OSZAPC?]         byte register         80 - 90 {3}
                                    word register         144 - 162
                                    byte memory           (86 - 96) + EA
                                    word memory           (150 - 168) + EA
____________________
2 The last three CALL entries are indirect calls. They get the
address of the subroutine from a register (reg-ind) or from
memory (mem-ind).
3 The time depends on the values involved: the smaller the
numbers, the faster the operation. This applies to signed and
unsigned multiplication and division alike, which is why they all
show a range of values rather than a single value.
ESC              [none]             memory                8 + EA
                                    coproc. register      2
                                    This only makes sense if you know
                                    about coprocessors.
HLT              [none]             2 clocks
IDIV             [?OSZAPC?]         byte register         101 - 112
                                    word register         165 - 184
                                    byte memory           (107 - 118) + EA
                                    word memory           (171 - 190) + EA
IMUL             [OC,?SZAP?]        byte register         80 - 98
                                    word register         128 - 154
                                    byte memory           (86 - 104) + EA
                                    word memory           (134 - 160) + EA
IN               [none]             (AX/AL), port#        10
                                    (AX/AL), dx           8
INC              [OSZAP]            word register         2
                                    byte register         3
                                    word/byte memory      15 + EA
INT              [(IT=0) {4} ]      51 clocks {5}
INTO             [ {6} ]            overflow              54
                                    no overflow           4
IRET             [ {7} ]            24 clocks
J(condition)     [none]             This includes all conditional jumps
                                    (JAE, JZ, JNO, JLE, JP, etc.) with the
                                    exception of JCXZ.
                                    jump                  16
                                    no jump               4
____________________
4 Although this sets the trap flag and interrupt flag to 0, it
doesn't do it to YOUR flags, it does it to the flags that the
interrupt will see. Your flags are safely stored on the stack and
will return unaltered at the end of the interrupt.
5 There is one exception. INT 3, when coded as the single byte
trap interrupt, is 52 clocks.
6 If there is overflow, it pushes your flags and sets (IT=0)
for the interrupt. If there was no overflow, it does nothing. In
either case your own flags will remain unaffected.
7 This puts the copy of your old flags back in the flags
register. The flags will be the same as they were when you called
the interrupt.
JCXZ             [none]             jump                  18
                                    no jump               6
JMP              [none]             same segment          15
                                    different segment     15
                                    near (reg-ind)        11 {8}
                                    near (mem-ind)        18 + EA
                                    far (mem-ind)         24 + EA
LAHF             [none]             4 clocks
LDS              [none]             16 + EA
LES              [none]             16 + EA
LEA              [none]             2 + EA
LOCK             [none]             2 clocks
LODS             [none]             12 clocks
LOOP             [none]             jump                  17
                                    no jump               5
LOOPE/LOOPZ      [none]             jump                  18
                                    no jump               6
LOOPNE/LOOPNZ    [none]             jump                  19
                                    no jump               5
MOV              [none]             register, register    2
                                    register, memory      8 + EA
                                    memory, register      9 + EA
                                    register, constant    4
                                    memory, constant      10 + EA
                                    (AX/AL) <-> memory    10 {9}
                                    segreg <-> register   2 {10}
                                    segreg, memory        8 + EA
                                    memory, segreg        9 + EA
____________________
8 The last three JMP entries are indirect jumps. The information
about where to jump to comes from a register (reg-ind) or from
memory (mem-ind).
9 This is a special instruction which moves a directly
addressed variable to or from AX (or AL for bytes). Pointers are
not allowed, only the forms:
    mov ax, variable1
    mov variable1, ax
This takes 10 clocks instead of 14 or 15 for the general forms.
Whether this form gets used is up to the assembler, not you.
Fortunately, MASM, Turbo Assembler and A86 all use this form when
appropriate.
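The LOOP entries can be compared with the obvious alternative, DEC CX followed by JNZ, using the DEC and J(condition) figures. The Python sketch below counts only the loop-control instructions, not the loop body. Like the whole list, it ignores instruction-fetch time. By these numbers LOOP saves one clock per iteration.

```python
def loop_control_clocks(iterations: int) -> tuple:
    """Clocks spent only on loop control for a body run `iterations` (>= 1) times.

    Returns (with LOOP, with DEC CX + JNZ).
    """
    taken = iterations - 1                     # branches back to the top
    with_loop = 17 * taken + 5                 # LOOP: jump 17, no jump 5
    with_dec_jnz = (2 + 16) * taken + (2 + 4)  # DEC word register 2; JNZ jump 16 / no jump 4
    return with_loop, with_dec_jnz


for n in (1, 10, 100):
    print(n, loop_control_clocks(n))
```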
MOVS             [none]             18 clocks
MUL              [OC,?SZAP?]        byte register         70 - 77
                                    word register         118 - 133
                                    byte memory           (76 - 83) + EA
                                    word memory           (124 - 139) + EA
NEG              [OSZAPC] {11}      register              3
                                    memory                16 + EA
NOP              [none]             3 clocks
NOT              [none]             register              3
                                    memory                16 + EA
OR               [SZP,(OC=0),?A?]   see ADD
OUT              [none]             (AX/AL), port#        10
                                    (AX/AL), dx           8
POP              [none]             register              8
                                    segreg                8
                                    memory                17 + EA
POPF             [ {12} ]           8 clocks
PUSH             [none]             register              11
                                    segreg                10
                                    memory                16 + EA
PUSHF            [none]             10 clocks
RCL              [OC]               register by 1 bit     2
                                    memory by 1 bit       15 + EA
                                    register by # in CL   8 + (4 * #) {13}
                                    memory by # in CL     20 + EA + (4 * #)
RCR              [OC]               see RCL
____________________
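10 MOV, PUSH and POP are the only general instructions that take a
segment register as an operand, and of these only MOV and POP
change it (PUSH just copies it to the stack). The other ways to
alter a segment register are LDS and LES, which load DS or ES, and
the far control transfers (far CALL, far JMP, far RET, INT and
IRET), which change CS.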
11 NEG sets the flags exactly as subtracting the operand from 0 would.
12 POPF resets the flags register by POPping a word off the
stack and using the values stored in that word.
13 Thus, if you rotate right by 3 bits add (4 * 3), if you
rotate by 7 bits add (4 * 7), if you rotate by 2 bits add
(4 * 2). As you can see, this can cost a lot of time if you are
rotating more than 3 or 4 bits.
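Footnote 13's warning can be made concrete. The Python sketch below prices a rotate by CL, including the MOV register, constant (4 clocks) that loads CL, against repeating the one-bit form. It uses only figures from this list, and its EA of 6 assumes a named variable. On these execution times alone, repeated one-bit rotates always win for a register operand. For a memory operand, the CL form wins from 2 bits onward because the address is calculated only once. The list ignores instruction fetching, and a run of separate rotate instructions must also be fetched, so real code may see a smaller difference in the register case.

```python
def rotate_register(n: int) -> tuple:
    """(mov cl, n + rcl reg, cl) versus n x (rcl reg, 1), in clocks."""
    by_cl = 4 + (8 + 4 * n)            # MOV register, constant; then rotate by # in CL
    one_bit_at_a_time = 2 * n          # register by 1 bit
    return by_cl, one_bit_at_a_time


def rotate_memory(n: int, ea: int) -> tuple:
    """Same comparison for a memory operand whose EA takes `ea` clocks."""
    by_cl = 4 + (20 + ea + 4 * n)      # the EA is calculated once
    one_bit_at_a_time = n * (15 + ea)  # ...but once per instruction here
    return by_cl, one_bit_at_a_time


for n in (1, 2, 3, 4, 7):
    print(n, rotate_register(n), rotate_memory(n, ea=6))
```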
REP              [none]             2 clocks
RET              [none]             near ret              8 {14}
                                    near ret (#)          12
                                    far ret               18
                                    far ret (#)           17
ROL              [OC]               see RCL
ROR              [OC]               see RCL
SAHF             [ {15} ]           4 clocks
SAL/SHL          [OSZPC,?A?]        see RCL
SAR              [OSZPC,?A?]        see RCL
SBB              [OSZAPC]           see ADD
SCAS             [OSZAPC]           15 clocks
SEGMENT OVERRIDE [none]             2 clocks
SHR              [OSZPC,?A?]        see RCL
STC              [(C=1)]            2 clocks
STD              [(D=1)]            2 clocks
STI              [(I=1)]            2 clocks
STOS             [none]             11 clocks
SUB              [OSZAPC]           see ADD
TEST             [SZP,(OC=0),?A?]   register, register    3
                                    register, memory      9 + EA
                                    memory, register      9 + EA
                                    (AX/AL), constant     4
                                    register, constant    5
                                    memory, constant      11 + EA
WAIT             [none]             3 clocks minimum, then check every 5
                                    clocks
XCHG             [none]             register, register    4
                                    (AX only), register   3
                                    register, memory      17 + EA
XLAT             [none]             11 clocks
XOR              [SZP,(OC=0),?A?]   see ADD
____________________
14 The # is the number of bytes to discard from the stack after
the return address has been popped, the way a Pascal procedure
removes its parameters:
    ret 18
    ret 6
15 Sets the SZAPC flags from the corresponding bits of the AH
register.
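The list can also be used to price alternatives. As an example, consider multiplying AX by 10. The Python sketch below totals a shift-and-add sequence against MOV plus MUL, and checks that the sequence really multiplies by 10 (modulo 2^16). Using BX as the scratch register is an assumption. The two are not interchangeable in every respect: MUL leaves a 32-bit product in DX:AX and sets the flags differently, while the shifts keep only the low 16 bits.

```python
# Clocks for each instruction, taken from the list above.
SHIFT_AND_ADD = [
    ("mov bx, ax", 2),   # MOV register, register
    ("shl ax, 1", 2),    # SHL register by 1 bit (see RCL)   -> AX * 2
    ("shl ax, 1", 2),    #                                   -> AX * 4
    ("add ax, bx", 3),   # ADD register, register            -> AX * 5
    ("shl ax, 1", 2),    #                                   -> AX * 10
]
MOV_BX_10 = 4                   # MOV register, constant
MUL_WORD_REGISTER = (118, 133)  # MUL word register


def times10(ax: int) -> int:
    """Emulate the shift-and-add sequence on a 16-bit AX."""
    bx = ax
    ax = (ax << 1) & 0xFFFF
    ax = (ax << 1) & 0xFFFF
    ax = (ax + bx) & 0xFFFF
    ax = (ax << 1) & 0xFFFF
    return ax


shift_total = sum(clocks for _, clocks in SHIFT_AND_ADD)
mul_range = tuple(MOV_BX_10 + c for c in MUL_WORD_REGISTER)
print(f"shift and add: {shift_total} clocks, MUL: {mul_range[0]} - {mul_range[1]} clocks")
# shift and add: 11 clocks, MUL: 122 - 137 clocks
```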